Can you please release how you post-trained Qwen3 on DeepSeek?
#12 opened 2 days ago
by
ZeroWw
Tried it, but not as good as expected.
1
#11 opened 3 days ago
by
kk3dmax
Does the /no_think tag no longer work?
2
#10 opened 3 days ago
by
loong
Any plans for a Qwen3-32B model?
👍
9
7
#9 opened 3 days ago
by
wanghf
BTW, for programmers, the `Gemma` series is the best at helping you write comments, docstrings, and documents.
#8 opened 3 days ago
by
DOFOFFICIAL

DeepSeek-R1-Lite
❤️
🔥
16
5
#6 opened 3 days ago
by
Dampfinchen
generation_config.json is missing
👀
1
#5 opened 3 days ago
by
Doctor-Chad-PhD

Model broken
👍
3
7
#4 opened 3 days ago
by
sm54
Any plans on gemma series? ;-;
❤️
4
4
#2 opened 3 days ago
by
Nakdesu

Any plans on 30B-A3B model?
🔥
28
7
#1 opened 3 days ago
by
xxx777xxxASD
